{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import data from Google Cloud Storage (GCS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's convenient and secure to host data in the cloud for data labeling, then sync task references to Label Studio to allow data annotators to view and label the tasks without your data leaving the secure cloud bucket. \n",
    "\n",
    "If your data is hosted in Google Cloud Storage (GCS), you can write a Python script to continuously sync data from the bucket with Label Studio. Follow this example to see how to do that with the [Label Studio SDK](https://labelstud.io/sdk/index.html). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get your GCS bucket ready"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connect to your GCS bucket and create a list of task references that Label Studio can use, based on the contents of your bucket. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "BUCKET_NAME = 'your-bucket'  # specify your bucket name here\n",
    "PREFIX = 'bucket/subfolder'  # specify your prefix here\n",
    "# assuming you service account key is stored in GOOGLE_APPLICATION_CREDENTIALS\n",
    "GOOGLE_APPLICATION_CREDENTIALS_FILE = os.getenv('GOOGLE_APPLICATION_CREDENTIALS') \n",
    "with open(GOOGLE_APPLICATION_CREDENTIALS_FILE) as f:\n",
    "    credentials = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a Label Studio Project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connect to the Label Studio API with your personal API key, which you can retrieve from your user account page, and confirm you can successfully connect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from label_studio_sdk.client import LabelStudio\n",
    "LABEL_STUDIO_URL = 'http://localhost:8080'\n",
    "API_KEY = '27c982caa9e599c9d81b25b00663e7d4f82c9e3c'\n",
    "\n",
    "ls = LabelStudio(base_url=LABEL_STUDIO_URL, api_key=API_KEY)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the project. In this example, the project is a basic [image object detection project](https://labelstud.io/templates/image_bbox.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "project = ls.projects.create(\n",
    "    title='Using Images from GCS',\n",
    "    label_config='''\n",
    "    <View>\n",
    "        <Image name=\"image\" value=\"$image\"/>\n",
    "        <RectangleLabels name=\"objects\" toName=\"image\">\n",
    "            <Label value=\"Airplane\"/>\n",
    "            <Label value=\"Car\"/>\n",
    "        </RectangleLabels>\n",
    "    </View>\n",
    "    '''\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to your GCS bucket"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connect your newly-created project to your GCS bucket:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "storage = ls.import_storage.gcs.create(\n",
    "    project=project.id,\n",
    "    bucket=BUCKET_NAME,\n",
    "    prefix=PREFIX,\n",
    "    regex_filter='.*\\.jpg',\n",
    "    google_application_credentials=credentials,\n",
    "    use_blob_urls=True,\n",
    "    presign=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sync tasks from GCS to Label Studio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After connecting to your bucket, you can import your private GCS links to Label Studio. When opening in Label Studio interface, they're automatically presigned for security! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ls.import_storage.gcs.sync(id=storage.id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a few lines of code you assessed the data in your bucket, set up a new labeling project, and synced the tasks to the project. You can adapt this example to more easily create a data creation to data labeling pipeline."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
